PREFACE


The work described in this report was performed by the Science Data 
Analysis Division of the Jet Propulsion Laboratory. 
ABSTRACT


Obtaining accurate three-dimensional (3-D) measurements from a stereo
pair of TV cameras is a task requiring camera modeling, calibration, and
the matching of the two images of a real 3-D point on the two TV pictures.

A system which models and calibrates the cameras and pairs the two
images of a real-world point in the two pictures, either manually or
automatically, was implemented at JPL. This system is operating and
provides three-dimensional measurements with a resolution of ±1 mm at
distances of about 2 m.


I. INTRODUCTION


Extracting 3-D measurements (X, Y, Z) from a stereo pair of
two-dimensional images is a task important in various applications. It is
used in orthodontics (Refs. 1 and 2) as well as in various automatic control
and robot systems. The use of a computer as a component in a system which
measures the 3-D position of feature points has been tried before
(Refs. 3-7).

The computer solves the system equations, finds and stores the cameras'
calibration parameters, and pairs the points in the two images. Our
present system produces accurate 3-D measurements from on-line TV
images. We have obtained this result by using two rigidly connected
solid-state TV cameras as sensors (GE TN-2000, which is a charge injection
device TV camera) with highly linear 50-mm lenses. This results in a
stable and linear two-camera system. In addition, we used an accurate
and flexible camera calibration scheme and a linear camera model.

The pairing of points in the two images is done either automatically, by
the use of a correlation algorithm, or manually by an operator. The
correlation algorithm works successfully due to accurate calibration and
provides good matching which, in turn, provides accurate 3-D measurements.
The following describes the camera modeling, the cameras' calibration
scheme, the algorithm which pairs stereo images of a point by correlation,
and the equations which solve for the real-world position of the point whose
two images in the two pictures were paired.

II. THE CAMERA MODEL 

The light sensor in the CID cameras (Fig. 1) is a rectangular area
containing a two-dimensional array of 188 × 244 light-sensitive elements.

The video output (the TV picture) of each camera is digitized so that the
picture appears to the computer (in our case, SPC-16/85 with a 64K 16-bit
core) as a two-dimensional array 188 × 244 of 8-bit numbers. The elements
of that array are indexed by (i, j) where 0 ≤ i ≤ 243, 0 ≤ j ≤ 187. This
array is called the gray level array and the values of these numbers
correspond to the brightness of the image.

The calibration parameters allow matching of each element (i, j) of the
image with a ray C + λ·R(i, j), 0 ≤ λ < ∞, in the real 3-D world, so that
if a real-world point P is imaged on picture element (i, j), it must be on
that ray; that is, P = C + λ·R(i, j) for some λ > 0. We assume that the
cameras are geometrically linear, an assumption which is sufficient,
considering the linear sensor array and the high quality lens that is used.
The assumption of linearity means that there are C_1, H_1, A_1, and V_1 for
the first camera and C_2, H_2, A_2, V_2 for the second camera such that if a
real-world point P is imaged on image coordinates (I_1, J_1) in the first
camera and on image coordinates (I_2, J_2) in the second camera, then:

    I_1 = \frac{(P - C_1, H_1)}{(P - C_1, A_1)}        (1)

    J_1 = \frac{(P - C_1, V_1)}{(P - C_1, A_1)}        (2)

    I_2 = \frac{(P - C_2, H_2)}{(P - C_2, A_2)}        (3)

    J_2 = \frac{(P - C_2, V_2)}{(P - C_2, A_2)}        (4)

where (x, y) denotes the inner product of the vectors x and y.
The semantic meaning of these parameter vectors is as follows: C_1
and C_2 are the positions of the focal center of the first and the second
camera respectively, measured in the external coordinate system. (Hence,
P - C is a vector from the focal center of the camera towards P.) A_1 (A_2)
is a unit vector in the direction in which the first (second) camera is
pointed. It is thus the direction of the symmetry axis of the lens as
measured in the external coordinate system.

H_1 (H_2) is called the horizontal vector of the first (second) camera.
H_1 and H_2 are not unit vectors and are not perpendicular to A.

V_1 (V_2) is called the vertical vector of the first (second) camera. It is
not a unit vector and it is not necessarily perpendicular to either A_1 (A_2)
or H_1 (H_2).

The meaning of the H and V vectors is defined by Eqs. 1-4.

H_1 - (H_1, A_1) A_1 is a vector in the real-world direction of the line of
the sensor elements for which (j = 0), i = 0, ..., 187 in the first camera.
These elements produce the horizontal line on the first image.
V_1 - (V_1, A_1) A_1 is a vector in the direction of the sensor elements on
the vertical line (i = 0), j = 0, ..., 243 in the sensor array. H_2 and V_2
have the identical meaning in the second camera.
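
As an illustration of the linear model, the following minimal Python sketch
evaluates Eqs. (1) and (2) for one camera. The numerical values (a camera at
the origin pointed along +Z, a scale factor f, and an image center (i0, j0))
are hypothetical and chosen only to make the arithmetic easy to follow; they
are not the calibrated parameters of the system described here.

```python
import numpy as np

def project(P, C, A, H, V):
    """Image coordinates (I, J) of world point P under Eqs. (1)-(2)."""
    d = np.asarray(P, float) - np.asarray(C, float)
    depth = d @ A  # (P - C, A): distance along the pointing direction
    if depth <= 0:
        raise ValueError("point is not in front of the camera")
    return (d @ H) / depth, (d @ V) / depth

# Hypothetical camera (assumed values, for illustration only).
f, i0, j0 = 500.0, 94.0, 122.0
C = np.zeros(3)
A = np.array([0.0, 0.0, 1.0])
H = f * np.array([1.0, 0.0, 0.0]) + i0 * A  # not a unit vector, not perpendicular to A
V = f * np.array([0.0, 1.0, 0.0]) + j0 * A

I, J = project([0.1, -0.2, 2.0], C, A, H, V)

# Removing the component along A leaves the direction of a sensor line.
horizontal_direction = H - (H @ A) * A
```

With these assumed values, H - (H, A) A is parallel to the X axis, which
matches the interpretation of the horizontal vector given above.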


III. THE CALIBRATION SYSTEM

The calibration process attempts to find the parameters C, A, H and 
V of both cameras. This is done as follows. 


A set of n known points P_1, ..., P_n in the real world are imaged on the
camera, and the picture coordinates on which they are imaged, (i_1, j_1),
..., (i_n, j_n), are obtained and stored. In our case, it is done by moving
the robot arm in front of the camera (Fig. 1) and taking pictures of the
arm. A pattern recognition algorithm automatically finds a feature point of
the arm in each picture. The arm is calibrated so that the actual coordinate
P_m of the m-th feature point in the real world is known. The (i_m, j_m),
which is the image of P_m on the screen, is found by the pattern recognition
system.

This set of matches of P_m with (i_m, j_m) is used to solve for C, A, H and
V by substituting P_m, i_m into Eq. (1). This yields

    i_m = \frac{(P_m - C, H)}{(P_m - C, A)}

which is equivalent to

    i_m (P_m - C, A) = (P_m - C, H)

or

    i_m (P_m, A) - (P_m, H) - i_m (C, A) + (C, H) = 0

and let

    C_H = (C, H)  and  C_A = (C, A)

Then we get the following linear system of n equations in the 8 unknowns
A_1, A_2, A_3, H_1, H_2, H_3, C_A, C_H (here the subscripts 1, 2, 3 denote
vector components of a single camera's parameters):

    (P_{m,1} i_m) A_1 + (P_{m,2} i_m) A_2 + (P_{m,3} i_m) A_3
      - P_{m,1} H_1 - P_{m,2} H_2 - P_{m,3} H_3 - i_m C_A + C_H = 0        (6)

where P_m = (P_{m,1}, P_{m,2}, P_{m,3}) and m = 1, ..., n.


This linear system of n equations in 8 variables is reduced to (n - 4)
equations in 3 variables by setting A_1 = 1 and setting all coefficients of
H_1, H_2, H_3 and C_H to 0 by combining 5 equations at a time. We solve for
the optimal A_2, A_3, C_A by the standard optimization approach to the
solution of an n × m linear system SX = B where S is an n by m matrix and
n > m. We use the fact that the X_0 ∈ R^m that satisfies

    \min_{X \in R^m} \|S X - B\|^2 = \|S X_0 - B\|^2

is the same as the X_0 which solves

    S^T S X_0 = S^T B.
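
The least-squares step can be sketched in Python as follows. This is a
minimal illustration of solving an overdetermined system through the normal
equations, not the original implementation; the matrix S and vector B below
are synthetic data generated only for the example, and NumPy's `lstsq` is
used as an independent check.

```python
import numpy as np

def solve_normal_equations(S, B):
    """Least-squares solution X_0 of S X = B via S^T S X_0 = S^T B."""
    S = np.asarray(S, float)
    B = np.asarray(B, float)
    return np.linalg.solve(S.T @ S, S.T @ B)

# Synthetic overdetermined system: 20 equations, 3 unknowns (assumed data).
rng = np.random.default_rng(0)
S = rng.normal(size=(20, 3))
x_true = np.array([0.5, -1.0, 2.0])
B = S @ x_true + 0.01 * rng.normal(size=20)

x0 = solve_normal_equations(S, B)
x_ref, *_ = np.linalg.lstsq(S, B, rcond=None)  # independent reference solution
```

Forming S^T S squares the condition number of the problem, so for poorly
conditioned data a QR- or SVD-based solver such as `lstsq` is generally the
safer choice.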


Then (A_1, A_2, A_3) is scaled to one (\|A\| = 1) and H_1, H_2, H_3, C_H are
computed from the original n-equation system after the known A and C_A
are substituted in the equation. Similarly, C_V, V_1, V_2, V_3 are computed
by using Eq. (2) as follows:

    j_m = \frac{(P_m - C, V)}{(P_m - C, A)}        (7)

Next, C = (C_1, C_2, C_3) is computed by solving the system:

    C_A = C_1 A_1 + C_2 A_2 + C_3 A_3

    C_H = C_1 H_1 + C_2 H_2 + C_3 H_3

    C_V = C_1 V_1 + C_2 V_2 + C_3 V_3

Note that C_A, C_H and C_V are known at this point.
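
The last step is a 3 × 3 linear system whose rows are A, H and V. A minimal
Python sketch, using hypothetical parameter values rather than calibrated
ones, is:

```python
import numpy as np

def focal_center(A, H, V, C_A, C_H, C_V):
    """Solve (C, A) = C_A, (C, H) = C_H, (C, V) = C_V for the focal center C."""
    M = np.vstack([A, H, V])  # rows are the three parameter vectors
    return np.linalg.solve(M, np.array([C_A, C_H, C_V], float))

# Hypothetical camera vectors and focal center (assumed values).
A = np.array([0.0, 0.0, 1.0])
H = np.array([500.0, 0.0, 94.0])
V = np.array([0.0, 500.0, 122.0])
C_true = np.array([0.3, -0.1, 1.2])

# Recover C from the three inner products that calibration provides.
C = focal_center(A, H, V, C_true @ A, C_true @ H, C_true @ V)
```

The system has a unique solution as long as A, H and V are linearly
independent.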

After C, A, H, and V have been computed, the differences

    i_m - \frac{(P_m - C, H)}{(P_m - C, A)},    j_m - \frac{(P_m - C, V)}{(P_m - C, A)}

are computed. These values supply information as to the adequacy of the
computed camera model; e.g., how much the computed (i, j) deviate from
the actual (i_m, j_m). At present, the deviation does not exceed errors
anticipated from the cameras' resolution coupled with the anticipated errors
resulting from errors in the arm calibration parameters which are used to
compute the feature point position (the 
IV. PROJECTING PICTURE ELEMENTS INTO THE REAL WORLD


The cameras' calibration parameters are used to project picture points
onto the real world. A real point P imaged on point (I_1, J_1) in the first
camera will satisfy the relation in Eq. (1) and Eq. (2). These equations can
be inverted as follows to solve for possible P where I_1 and J_1 are given.

    Eq. 1:    I_1 = \frac{(P - C_1, H_1)}{(P - C_1, A_1)}

    Eq. 2:    J_1 = \frac{(P - C_1, V_1)}{(P - C_1, A_1)}

which is equivalent to

    (P - C_1, H_1 - I_1 A_1) = 0

and

    (P - C_1, V_1 - J_1 A_1) = 0

Hence, P - C_1 is perpendicular to both H_1 - I_1 A_1 and V_1 - J_1 A_1,
and therefore

    P - C_1 \parallel (V_1 - J_1 A_1) \times (H_1 - I_1 A_1)

or

    P = C_1 + \lambda R_1(I_1, J_1),    \lambda > 0

where

    R_1(I_1, J_1) \propto (V_1 - J_1 A_1) \times (H_1 - I_1 A_1)

and R_1(I_1, J_1) is a unit vector.

In other words, given that a point P is imaged on (I_1, J_1) in the first
camera, we know that P must lie on the real-world line L_1:

    L_1 = C_1 + \lambda R_1(I_1, J_1),    0 \le \lambda < +\infty

where

    R_1(I_1, J_1) \propto (V_1 - J_1 A_1) \times (H_1 - I_1 A_1)

and R_1 is normalized to 1.
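
The inversion can be sketched in Python as below. The camera vectors are
hypothetical values chosen for illustration. The final sign check, which
flips the cross product when it points behind the camera, is an added
safeguard and an assumption of this sketch: the orientation of the cross
product depends on the handedness of the H, V, A triple, which the text does
not fix.

```python
import numpy as np

def pixel_ray(I, J, A, H, V):
    """Unit direction R(I, J) of the ray of points imaged on pixel (I, J)."""
    R = np.cross(V - J * A, H - I * A)
    R = R / np.linalg.norm(R)
    # Assumption of this sketch: orient the ray toward the front of the camera.
    return R if R @ A > 0 else -R

# Hypothetical camera at the origin pointed along +Z (assumed values).
A = np.array([0.0, 0.0, 1.0])
H = np.array([500.0, 0.0, 94.0])
V = np.array([0.0, 500.0, 122.0])

# Under Eqs. (1)-(2), the point (0.1, -0.2, 2.0) is imaged on (119, 72)
# by this camera, so the ray through that pixel should pass through it.
P = np.array([0.1, -0.2, 2.0])
R = pixel_ray(119.0, 72.0, A, H, V)
```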
Usually we know the bounds on the distance of P from the focal center
of the cameras (in our case, a point typically will be at a distance varying
from 0.5 m in front of the camera to +∞). Two points are computed
on the line C_1 + λ·R_1(I_1, J_1), one 0.5 m in front of the first camera
and one at infinity. These two points are projected on the image of the
second camera using Eqs. (3) and (4): the near one on T_n = (I_n, J_n) and
the far one on T_f = (I_f, J_f). The point P itself will be imaged somewhere
on the straight line in the two-dimensional picture connecting the two
points T_n and T_f. The exact image of P on that line in the second image is
found manually (using a cursor) or automatically (using the correlation
algorithm described in the next section). Once the (I_2, J_2), which are the
picture coordinates on which P is imaged in the second camera, are found,
the position of P can be computed.
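
The construction of the search line in the second image can be sketched as
follows. Both cameras here are hypothetical (identical orientation, second
camera displaced 0.3 m along X), and the helper names are illustrative. The
projection of the point at infinity is taken as the limit of Eqs. (3) and (4)
along the ray, which depends only on the ray direction.

```python
import numpy as np

def project(P, C, A, H, V):
    """Eqs. (3)-(4): image coordinates of world point P."""
    d = np.asarray(P, float) - C
    return (d @ H) / (d @ A), (d @ V) / (d @ A)

def project_direction(R, A, H, V):
    """Limit of project(C1 + lam * R, ...) as lam grows without bound."""
    return (R @ H) / (R @ A), (R @ V) / (R @ A)

# Hypothetical second camera (assumed values).
A2 = np.array([0.0, 0.0, 1.0])
H2 = np.array([500.0, 0.0, 94.0])
V2 = np.array([0.0, 500.0, 122.0])
C2 = np.array([0.3, 0.0, 0.0])

# Ray from the first camera (at the origin) through a test point P.
C1 = np.zeros(3)
P = np.array([0.1, -0.2, 2.0])
R1 = (P - C1) / np.linalg.norm(P - C1)

T_near = project(C1 + 0.5 * R1, C2, A2, H2, V2)  # 0.5 m along the ray
T_far = project_direction(R1, A2, H2, V2)        # point at infinity
T_P = project(P, C2, A2, H2, V2)                 # lies between the two
```

In this example the near end point falls outside the sensor area, so in
practice the segment would be clipped to the picture boundaries before
searching.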


(I_2, J_2) define a line in space,

    L_2 = C_2 + \lambda R_2(I_2, J_2),    0 \le \lambda < +\infty

by the same mechanism by which L_1 was defined. The point P is computed
to be the 3-D point for which the sum of its squared distances to the two
lines is minimized. We do not use the intersection of the two lines because
it may not exist due to numerical errors in calibration. That is, the
computed P will be a three-dimensional point which satisfies:

    \min_{X \in R^3} [ \|X - C_1\|^2 - (X - C_1, R_1(I_1, J_1))^2
                     + \|X - C_2\|^2 - (X - C_2, R_2(I_2, J_2))^2 ]
      = \|P - C_1\|^2 - (P - C_1, R_1(I_1, J_1))^2
      + \|P - C_2\|^2 - (P - C_2, R_2(I_2, J_2))^2

The reader is reminded that

    \|X - C\|^2 - (X - C, R)^2

where \|R\| = 1 is the squared distance between X and the line C + \lambda R.
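
One way to carry out this minimization, sketched here in Python under the
assumption that the two lines are not parallel, is to set the gradient of
the objective to zero. For each line the squared distance is
(X - C)^T Q (X - C) with Q = I - R R^T, so the minimizer solves
(Q_1 + Q_2) X = Q_1 C_1 + Q_2 C_2. The function name and test data are
illustrative only.

```python
import numpy as np

def closest_point_to_lines(C1, R1, C2, R2):
    """Point minimizing the sum of squared distances to two 3-D lines."""
    M = np.zeros((3, 3))
    b = np.zeros(3)
    for C, R in ((C1, R1), (C2, R2)):
        C = np.asarray(C, float)
        R = np.asarray(R, float)
        R = R / np.linalg.norm(R)
        Q = np.eye(3) - np.outer(R, R)  # projector onto the plane normal to R
        M += Q
        b += Q @ C
    return np.linalg.solve(M, b)  # singular if the lines are parallel

# Two rays that really intersect at P recover P.
P = np.array([0.1, -0.2, 2.0])
C1, C2 = np.zeros(3), np.array([0.3, 0.0, 0.0])
X_hit = closest_point_to_lines(C1, P - C1, C2, P - C2)

# Two skew lines (X axis, and a line parallel to Y at height 1):
# the minimizer lies midway between them.
X_skew = closest_point_to_lines([0, 0, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0])
```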
Hence, the problem of extracting the 3-D coordinates of a point is
reduced to finding and matching the two images of that point on the two video
pictures.

V. THE STEREO CORRELATION ALGORITHMS


This section deals with the problem of matching the two images of a
real-world point P = (X, Y, Z), which is in the field of view of both
cameras. The method used is correlation of grey levels in the right and left
images.


Let T_1 = (I_1, J_1) be the image of P in the right camera and
T_2 = (I_2, J_2) be the image of P in the left camera. As described above,
T_1 and T_2 define rays from the right and left cameras to the point P. The
intersection of these rays gives the coordinates (X, Y, Z) of the point P.

The problem of correlation is to find the points T_1 and T_2 which are
images of the same point P.

In the current implementation, T_1 is selected with a cursor on the
Ramtek display of the right image (see Fig. 3). A set of N points called
a mask is selected from a small window centered at T_1. Currently, two
types of masks are used. One is a set of concentric diamonds D_0, ..., D_n,
where D_0 = T_1 and

    D_i = \{ (I, J) : |I_1 - I| + |J_1 - J| = d_i \}

Typical values for d_i are d_1 = 1, d_2 = 2, d_3 = 4, d_4 = 8 (N = 61)
(see Fig. 4).

The second type of mask (see Fig. 5) consists of 4 line segments
defined by an integer k as follows:

    (1)  horizontal    (I_1 - k, J_1)     to  (I_1 + k, J_1)
    (2)  vertical      (I_1, J_1 - k)     to  (I_1, J_1 + k)
    (3)  diagonal      (I_1 - k, J_1 - k) to  (I_1 + k, J_1 + k)
    (4)  diagonal      (I_1 - k, J_1 + k) to  (I_1 + k, J_1 - k)

A typical value of k is 8 (N = 65).
In either case, the mask representation is generalized as two sequences
of N displacements ΔI_i and ΔJ_i where i = 1, ..., N. Thus, each point m_i
of the mask can be expressed as (I + ΔI_i, J + ΔJ_i) with the mask centered
at an arbitrary point (I, J).
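
Both mask types can be generated as lists of displacements, as in the
following Python sketch. It assumes that each diamond is the set of points
at city-block distance d_i from the center, an interpretation that
reproduces the point counts quoted above (N = 61 and N = 65). The function
names are illustrative.

```python
def diamond_mask(radii=(1, 2, 4, 8)):
    """Displacements for concentric diamonds |dI| + |dJ| = d, plus the center."""
    offsets = [(0, 0)]
    for d in radii:
        for di in range(-d, d + 1):
            dj = d - abs(di)
            offsets.append((di, dj))
            if dj != 0:  # avoid counting the two side corners twice
                offsets.append((di, -dj))
    return offsets

def four_line_mask(k=8):
    """Displacements for horizontal, vertical and two diagonal segments."""
    points = set()  # a set, so the shared center is counted once
    for t in range(-k, k + 1):
        points.update({(t, 0), (0, t), (t, t), (t, -t)})
    return sorted(points)

diamond = diamond_mask()
lines = four_line_mask()
```

A mask centered at (I, J) is then the list of points (I + dI, J + dJ) for
each displacement (dI, dJ).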


With the mask centered at T_1, the right image is sampled to find the
grey level at each point m_i of the mask. These values are stored as a
third N-element array.


To find T_2, a search is initiated along a line segment L_2 in the left
image. L_2 is determined, as mentioned above, by projecting the end points
of a segment in the real 3-D world on the ray C_1 + λ·R_1(I_1, J_1),
0 ≤ λ < ∞, onto the left image. This real-world segment is chosen according
to the context of the point P. For instance, if P is the centroid of a rock
to be picked up by the manipulator, then P is approximately 2 meters from
the camera center, so the endpoints of the real-world segment will be taken
as the points 1.5 m and 2.5 m from the right camera on R_1.


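For each point T_k on L_2, the following happens:

(a) With the mask centered at T_k = (I_k, J_k), the left image is
    sampled to find the grey level at each point
    m_i = (I_k + ΔI_i, J_k + ΔJ_i) of the mask on the left image.

(b) The correlation coefficient is computed by

    C_k = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}
               {\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2 \sum_{i=1}^{N} (Y_i - \bar{Y})^2}}        (1)

    where X_i and Y_i are the grey levels sampled at mask point m_i in the
    right image (mask centered at T_1) and in the left image (mask centered
    at T_k), respectively, and

    \bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i,    \bar{Y} = \frac{1}{N} \sum_{i=1}^{N} Y_i,    -1 \le C_k \le 1

    An equivalent form of (1) which requires fewer computations is

    C_k = \frac{N \sum X_i Y_i - \sum X_i \sum Y_i}
               {\sqrt{(N \sum X_i^2 - (\sum X_i)^2)(N \sum Y_i^2 - (\sum Y_i)^2)}}        (2)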
The X_i's remain constant while T_1 is fixed and we search for the
appropriate T_2. Hence, we can maximize expression (3) in the search
instead of expression (2) and save some compute time:

    D_k = \frac{H |H|}{N \sum_{i=1}^{N} Y_i^2 - ( \sum_{i=1}^{N} Y_i )^2}        (3)

where

    H = \sum_{i=1}^{N} Z_i Y_i,    Z_i = N X_i - \sum_{j=1}^{N} X_j

and the Z_i's, which depend only on T_1, are maintained through the search
and do not have to be recomputed. Hence the time spent on each correlation
is consumed almost exclusively in computing

    \sum_{i=1}^{N} Y_i,    \sum_{i=1}^{N} Y_i^2,    \sum_{i=1}^{N} Z_i Y_i

for each candidate T_k.


The correlation coefficient will be equal to 1 if there is a perfect
linear equivalence between X_i and Y_i for 1 ≤ i ≤ N, and less than 1
otherwise. Thus, the point T_2 should correspond to the maximum value of C_k.
Since there may be several points with grey level distributions similar to
that around the actual point of interest T_1, additional steps are taken to
increase the probability of finding the correct point T_2.
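
The search can be illustrated with the Python sketch below. It computes the
correlation coefficient directly and also a cheaper score of the form
H·|H| / (N ΣY_i² - (ΣY_i)²) with H = Σ Z_i Y_i and Z_i = N X_i - Σ X_j; this
score equals C_k·|C_k| times a factor that depends only on the X_i, so it
ranks candidates in the same order as C_k. The grey levels are random
synthetic data, with one candidate constructed as an exact linear function
of the reference samples; all names are illustrative.

```python
import numpy as np

def correlation(x, y):
    """Correlation coefficient of two equally long grey-level arrays."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx @ dy) / np.sqrt((dx @ dx) * (dy @ dy))

def search_score(z, y):
    """Score with the same ordering as the correlation; z is precomputed."""
    y = np.asarray(y, float)
    n = len(y)
    h = z @ y
    return h * abs(h) / (n * (y @ y) - y.sum() ** 2)

rng = np.random.default_rng(1)
x = rng.integers(0, 256, size=65).astype(float)  # reference mask samples
z = len(x) * x - x.sum()                         # computed once per search

candidates = [rng.integers(0, 256, size=65).astype(float) for _ in range(10)]
candidates[4] = 0.5 * x + 20.0                   # a perfect linear match

c = [correlation(x, y) for y in candidates]
d = [search_score(z, y) for y in candidates]
best = int(np.argmax(d))
```

Neither function guards against a candidate with constant grey levels, for
which the denominator is zero; the variance test of the next section rejects
such regions on the reference side.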


If the values of C_k are plotted as a function of T_k, the curve might
look like that of Fig. 6.

In Fig. 6, T_2 should correspond to a local maximum on the curve. As
the C_k are generated, a record is kept of the (at most) four greatest local
maxima which occur, as well as the absolute maximum. The correlation at
each maximum point is recomputed using a new mask consisting of all
points in a 15 × 15 square. Then T_2 is taken as the point which gives the
highest correlation with this mask.

VI. MASK SELECTION 

For the correlation method to be effective, there must be significant
information in the grey levels around the point T_1 covered by the mask so
that the correlation value at T_2 will be significantly greater than at other
points in the neighborhood of T_2. Therefore, before correlation on both
images is started, two tests are applied to T_1 to select the proper size
mask (Ref. 7).

The first involves comparing the variance V_m of the grey levels in
the mask at T_1 to the noise level variance V_e of the camera. If
V_m < 3·V_e, the point is considered unacceptable for correlation because
the area in the mask is too homogeneous and there is not sufficient
information to discriminate between points.

The second test is autocorrelation. This involves computing the
correlation of a mask centered at T_1 with masks centered at points in the
neighborhood of T_1 in the same (right) image. The neighborhood used is the
line segment (I_1 - 4, J_1) to (I_1 + 4, J_1). If the correlation at the
neighborhood points is significantly less than 1, then correlation with the
left image proceeds. A successful autocorrelation test usually implies the
existence of a local maximum on the correlation curve (Fig. 6) at the
desired match point T_2 in the left image.

If either of these tests fails, the mask is expanded in size until the
area covered by the mask exceeds the boundaries of the homogeneous region
containing the point T_1. At this point, the variance and autocorrelation
tests will be satisfied. The mask is expanded K-1 times by factors of
2, 3, ..., K times the original size, until an acceptable mask is found.
Presently the value of K is 7. If an acceptable mask is not found after K-1
expansions, no correlation will be attempted.
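
The expansion loop can be sketched in Python as follows. For brevity this
sketch applies only the variance test; the autocorrelation test is omitted
because the text does not give a numerical threshold for "significantly less
than 1". The image layout (indexed as image[i, j]), the rule of stopping
when the mask leaves the picture, and all numerical values are assumptions
made for the example.

```python
import numpy as np

def select_mask_scale(image, center, offsets, noise_var, K=7):
    """Smallest expansion factor 1..K whose mask passes the variance test."""
    i0, j0 = center
    rows, cols = image.shape
    for scale in range(1, K + 1):
        pts = [(i0 + scale * di, j0 + scale * dj) for di, dj in offsets]
        if any(not (0 <= i < rows and 0 <= j < cols) for i, j in pts):
            return None  # assumed behavior: give up if the mask leaves the image
        grey = np.array([image[i, j] for i, j in pts], float)
        if grey.var() >= 3.0 * noise_var:
            return scale
    return None  # no acceptable mask: no correlation is attempted

# Four-line mask with k = 4 (33 points), as displacements.
offsets = sorted({p for t in range(-4, 5)
                  for p in ((t, 0), (0, t), (t, t), (t, -t))})

# Synthetic picture: homogeneous except for a bright region from column 60.
image = np.zeros((100, 100))
image[:, 60:] = 100.0

scale = select_mask_scale(image, (50, 50), offsets, noise_var=4.0)
flat = select_mask_scale(np.zeros((100, 100)), (50, 50), offsets, noise_var=4.0)
```

In this example the mask must be expanded by a factor of 3 before it reaches
past the homogeneous region around the selected point.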
Fig. 1. The hardware configuration:
the two cameras and the arm

Fig. 2. The linear model
of the camera pair

Fig. 3. Two-image stereo display with cursor overlay
on a pair of matching points

Fig. 4. Diamond mask

Fig. 5. Four-line segment mask

Fig. 6. Values of C_k plotted as a function of T_k
